Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval

Internal identifier: 000D23 (Main/Exploration); previous: 000D22; next: 000D24

Authors: SHIJIAN LU [Singapore]; LINLIN LI [Singapore]; CHEW LIM TAN [Singapore]

Source:

IEEE transactions on pattern analysis and machine intelligence [ISSN 0162-8828]; 2008.

RBID : Pascal:09-0012148

French descriptors

Pascal (Inist)
- Intelligence artificielle, Analyse forme, Recherche documentaire, Recherche image, Recherche information, Reconnaissance image, Image optique, Reconnaissance optique caractère, Reconnaissance caractère, Traitement image, Topologie, Interrogation base donnée, Analyse documentaire, Analyse image, Mesure forme, Annotation, Mot clé, Type document.
Wicri :
- topic : Intelligence artificielle, Recherche documentaire.

English descriptors

KwdEn :
- Annotation, Artificial intelligence, Character recognition, Database query, Document analysis, Document retrieval, Document types, Image analysis, Image processing, Image recognition, Image retrieval, Information retrieval, Keyword, Optical character recognition, Optical image, Pattern analysis, Shape measurement, Topology.

Abstract

This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.
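
As a rough illustration of the idea in the abstract, the sketch below codes each word by per-character shape features and retrieves documents by matching codes rather than recognized text. It is not the authors' scheme: it works on plain text instead of word images, omits water reservoirs, and the `ASCENDERS`, `DESCENDERS`, and `HOLES` tables, along with all function names, are assumptions made up for this example.

```python
# Toy illustration only: these class tables are assumptions for the example,
# not the code book used in the paper.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
HOLES = {"a": 1, "b": 1, "d": 1, "e": 1, "g": 1, "o": 1, "p": 1, "q": 1}


def char_shape(ch: str) -> str:
    """Return a two-symbol code: vertical extent (A/D/x) and hole count."""
    if ch in ASCENDERS:
        extent = "A"
    elif ch in DESCENDERS:
        extent = "D"
    else:
        extent = "x"
    return f"{extent}{HOLES.get(ch, 0)}"


def word_shape_code(word: str) -> str:
    """Concatenate the per-character codes, ignoring non-letters."""
    return "".join(char_shape(ch) for ch in word.lower() if ch.isalpha())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word shape code to the ids of the documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for word in text.split():
            code = word_shape_code(word)
            if code:
                index.setdefault(code, set()).add(doc_id)
    return index


def search(index: dict[str, set[str]], keyword: str) -> set[str]:
    """Retrieve documents whose words share the keyword's shape code."""
    return index.get(word_shape_code(keyword), set())


if __name__ == "__main__":
    docs = {
        "doc1": "document image retrieval",
        "doc2": "word shape coding",
    }
    index = build_index(docs)
    print(word_shape_code("image"))
    print(search(index, "image"))
```

Because the code is deliberately coarse, distinct words can share a code (here "dog" and "bag" both map to `A1x1D1`), so a practical system would presumably need richer features or a ranking step on top of exact code matching.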

Links to previous steps (curation, corpus, ...)

to stream PascalFrancis, to step Corpus: 000242
to stream PascalFrancis, to step Curation: 000537
to stream PascalFrancis, to step Checkpoint: 000245
to stream Main, to step Merge: 000D35
to stream Main, to step Curation: 000D23

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval</title>
<author><name sortKey="Shijian Lu" sort="Shijian Lu" uniqKey="Shijian Lu" last="Shijian Lu">SHIJIAN LU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace</s1>
<s2>Singapore, 119613</s2>
<s3>SGP</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore, 119613</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Linlin Li" sort="Linlin Li" uniqKey="Linlin Li" last="Linlin Li">LINLIN LI</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2</s1>
<s2>Singapore 117543</s2>
<s3>SGP</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore 117543</wicri:noRegion>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
<author><name sortKey="Chew Lim Tan" sort="Chew Lim Tan" uniqKey="Chew Lim Tan" last="Chew Lim Tan">CHEW LIM TAN</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2</s1>
<s2>Singapore 117543</s2>
<s3>SGP</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore 117543</wicri:noRegion>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">09-0012148</idno>
<date when="2008">2008</date>
<idno type="stanalyst">PASCAL 09-0012148 INIST</idno>
<idno type="RBID">Pascal:09-0012148</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000242</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000537</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000245</idno>
<idno type="wicri:doubleKey">0162-8828:2008:Shijian Lu:document:image:retrieval</idno>
<idno type="wicri:Area/Main/Merge">000D35</idno>
<idno type="wicri:Area/Main/Curation">000D23</idno>
<idno type="wicri:Area/Main/Exploration">000D23</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval</title>
<author><name sortKey="Shijian Lu" sort="Shijian Lu" uniqKey="Shijian Lu" last="Shijian Lu">SHIJIAN LU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 21 Heng Mui Keng Terrace</s1>
<s2>Singapore, 119613</s2>
<s3>SGP</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore, 119613</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Linlin Li" sort="Linlin Li" uniqKey="Linlin Li" last="Linlin Li">LINLIN LI</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2</s1>
<s2>Singapore 117543</s2>
<s3>SGP</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore 117543</wicri:noRegion>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
<author><name sortKey="Chew Lim Tan" sort="Chew Lim Tan" uniqKey="Chew Lim Tan" last="Chew Lim Tan">CHEW LIM TAN</name>
<affiliation wicri:level="4"><inist:fA14 i1="02"><s1>Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2</s1>
<s2>Singapore 117543</s2>
<s3>SGP</s3>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Singapour</country>
<wicri:noRegion>Singapore 117543</wicri:noRegion>
<orgName type="university">Université nationale de Singapour</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">IEEE transactions on pattern analysis and machine intelligence</title>
<title level="j" type="abbreviated">IEEE trans. pattern anal. mach. intell.</title>
<idno type="ISSN">0162-8828</idno>
<imprint><date when="2008">2008</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">IEEE transactions on pattern analysis and machine intelligence</title>
<title level="j" type="abbreviated">IEEE trans. pattern anal. mach. intell.</title>
<idno type="ISSN">0162-8828</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Annotation</term>
<term>Artificial intelligence</term>
<term>Character recognition</term>
<term>Database query</term>
<term>Document analysis</term>
<term>Document retrieval</term>
<term>Document types</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Image retrieval</term>
<term>Information retrieval</term>
<term>Keyword</term>
<term>Optical character recognition</term>
<term>Optical image</term>
<term>Pattern analysis</term>
<term>Shape measurement</term>
<term>Topology</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Intelligence artificielle</term>
<term>Analyse forme</term>
<term>Recherche documentaire</term>
<term>Recherche image</term>
<term>Recherche information</term>
<term>Reconnaissance image</term>
<term>Image optique</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Traitement image</term>
<term>Topologie</term>
<term>Interrogation base donnée</term>
<term>Analyse documentaire</term>
<term>Analyse image</term>
<term>Mesure forme</term>
<term>Annotation</term>
<term>Mot clé</term>
<term>Type document</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Intelligence artificielle</term>
<term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents a document retrieval technique that is capable of searching document images without optical character recognition (OCR). The proposed technique retrieves document images by a new word shape coding scheme, which captures the document content through annotating each word image by a word shape code. In particular, we annotate word images by using a set of topological shape features including character ascenders/descenders, character holes, and character water reservoirs. With the annotated word shape codes, document images can be retrieved by either query keywords or a query document image. Experimental results show that the proposed document image retrieval technique is fast, efficient, and tolerant to various types of document degradation.</div>
</front>
</TEI>
<affiliations><list><country><li>Singapour</li>
</country>
<orgName><li>Université nationale de Singapour</li>
</orgName>
</list>
<tree><country name="Singapour"><noRegion><name sortKey="Shijian Lu" sort="Shijian Lu" uniqKey="Shijian Lu" last="Shijian Lu">SHIJIAN LU</name>
</noRegion>
<name sortKey="Chew Lim Tan" sort="Chew Lim Tan" uniqKey="Chew Lim Tan" last="Chew Lim Tan">CHEW LIM TAN</name>
<name sortKey="Linlin Li" sort="Linlin Li" uniqKey="Linlin Li" last="Linlin Li">LINLIN LI</name>
</country>
</tree>
</affiliations>
</record>
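
The record above can be read programmatically; the following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`. As displayed here, the record uses the `inist:` and `wicri:` prefixes without any visible `xmlns` declarations, which a strict parser would likely reject, so the sketch rewrites those prefixes before parsing. The `SAMPLE` string is an abridged copy of the record, and `summarize` and `PREFIX_RE` are hypothetical names chosen for this example, not part of Dilib.

```python
import re
import xml.etree.ElementTree as ET

# Abridged copy of the record shown above (assumption: same element layout).
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval</title>
<author><name sortKey="Shijian Lu">SHIJIAN LU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s3>SGP</s3></inist:fA14>
<country>Singapour</country></affiliation></author>
<author><name sortKey="Linlin Li">LINLIN LI</name></author>
<author><name sortKey="Chew Lim Tan">CHEW LIM TAN</name></author>
</titleStmt>
<seriesStmt><idno type="ISSN">0162-8828</idno></seriesStmt>
</fileDesc></teiHeader></TEI></record>"""

# Turn undeclared prefixes in tag and attribute names (inist:fA14, wicri:level)
# into plain names (inist_fA14, wicri_level). Attribute values are not touched
# because they are preceded by a quote, not by "<", "</" or whitespace.
PREFIX_RE = re.compile(r"(</?|\s)(inist|wicri):")


def summarize(xml_text: str) -> dict:
    """Extract title, authors and ISSN from a record laid out as above."""
    root = ET.fromstring(PREFIX_RE.sub(r"\1\2_", xml_text))
    return {
        "title": root.findtext(".//titleStmt/title"),
        # Restrict to titleStmt: the full record repeats the authors
        # under sourceDesc/biblStruct/analytic.
        "authors": [n.text for n in root.findall(".//titleStmt/author/name")],
        "issn": root.findtext(".//seriesStmt/idno[@type='ISSN']"),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Rewriting the prefixes is a shortcut for quick inspection; if the real namespace URIs are known, declaring them on the root element would be the cleaner option.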

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D23 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000D23 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:09-0012148
   |texte=   Document Image Retrieval through Word Shape Coding : Real-world image annotation and retrieval
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

Exploration server on OCR
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
